Machine Learned Sentence Selection Strategies for Query-Biased Summarization

Authors

  • Donald Metzler
  • Tapas Kanungo
Abstract

It has become standard for search engines to augment result lists with document summaries. Each document summary consists of a title, abstract, and a URL. In this work, we focus on the task of selecting relevant sentences for inclusion in the abstract. In particular, we investigate how machine learning-based approaches can effectively be applied to the problem. We analyze and evaluate several learning to rank approaches, such as ranking support vector machines (SVMs), support vector regression (SVR), and gradient boosted decision trees (GBDTs). Our work is the first to evaluate SVR and GBDTs for the sentence selection task. Using standard TREC test collections, we rigorously evaluate various aspects of the sentence selection problem. Our results show that the effectiveness of the machine learning approaches varies across collections with different characteristics. Furthermore, the results show that GBDTs provide a robust and powerful framework for the sentence selection task and significantly outperform SVR and ranking SVMs on several data sets.
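As a rough illustration of the pointwise, regression-style ranking the abstract describes, the sketch below fits a tiny gradient-boosted ensemble with squared loss and uses it to order sentences for a query. It is not the authors' implementation: the three features (query-term overlap, sentence length, position), the use of one-split stumps instead of full decision trees, and all function names and default parameters are assumptions made for this example.

```python
from typing import List, Tuple

# A stump is (feature index, threshold, value if <= threshold, value if > threshold).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, List[Stump], float]


def features(query: str, sentence: str, position: int) -> List[float]:
    """Hypothetical features: query-term overlap, sentence length, position."""
    q_terms = set(query.lower().split())
    s_terms = sentence.lower().split()
    overlap = sum(1 for t in q_terms if t in s_terms) / len(q_terms) if q_terms else 0.0
    return [overlap, float(len(s_terms)), float(position)]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Pick the single split that minimizes squared error on the residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct feature values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, thr, lm, rm), err
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        best = (0, float("inf"), mean, mean)
    return best


def predict_stump(stump: Stump, x: List[float]) -> float:
    j, thr, left_value, right_value = stump
    return left_value if x[j] <= thr else right_value


def fit_gbdt(X: List[List[float]], y: List[float],
             n_rounds: int = 50, lr: float = 0.3) -> Model:
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * predict_stump(stump, x) for p, x in zip(preds, X)]
    return base, stumps, lr


def score(model: Model, x: List[float]) -> float:
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)


def rank_sentences(model: Model, query: str, sentences: List[str]) -> List[str]:
    """Order a document's sentences by predicted relevance, best first."""
    scored = [(score(model, features(query, s, i)), s) for i, s in enumerate(sentences)]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

To use the sketch, build `X` by calling `features` on labeled (query, sentence) pairs, pass `X` and the relevance labels to `fit_gbdt`, and call `rank_sentences` on the sentences of a new document; the top-ranked sentences would then be candidates for the abstract shown in the result list.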

Similar resources

RMIT at the NTCIR-12 MobileClick-2: iUnit Ranking and Summarization Subtasks

[1] R.-C. Chen, D. Spina, W.B. Croft, M. Sanderson, and F. Scholer. Harnessing Semantics for Answer Sentence Retrieval. In Proceedings of ESAIR'15, 2015. [5] D. Metzler and T. Kanungo. Machine Learned Sentence Selection Strategies for Query-Biased Summarization. In Proceedings of SIGIR 2008 Learning to Rank Workshop, 2008. [7] L. Yang, Q. Ai, D. Spina, R.-C. Chen, L. Pang, W.B. Croft, J. Guo, and ...

Feature expansion for query-focused supervised sentence ranking

We present a supervised sentence ranking approach for use in extractive summarization. Using a general machine learning technique provides great flexibility for incorporating varied new features, which we demonstrate. The system proves quite effective at query-focused multi-document summarization, both for single summaries and for series of update summaries.

Semi-Supervised Co-Clustering for Query-Oriented Theme-based Summarization

Sentence clustering plays an important role in theme-based summarization which aims to discover the topical themes defined as the clusters of highly related sentences. However, due to the short length of sentences, the word-vector cosine similarity traditionally used for document clustering is no longer suitable. To alleviate this problem, we regard a word as an independent text object rather t...

Query Focus Guided Sentence Selection Strategy for DUC 2006

This paper presents our new query-based multi-document summarization system for DUC 2006. It is an extended version of a previously developed generic multi-document summarization system (PoluS 1.0) that incorporates latent semantic analysis (LSA). To make the generated summaries satisfy the user's information need as fully as possible, we propose a query focus guided sentence...

Query-Based Summarization: A survey

This paper presents a survey of recent extractive query-based summarization techniques. We explore approaches for single document and multidocument summarization. Knowledge-based and machine learning methods for choosing the most relevant sentences from documents with respect to a given query are considered. Further, we expose tailored summarization techniques for particular domains like medica...

Publication date: 2008